The sequence matters: A systematic literature review of using sequence analysis in Learning Analytics
Describing and analysing sequences of learner actions is becoming increasingly popular in learning analytics. Nevertheless, the authors found considerable variety in how a learning sequence is defined, which data are used for the analysis, which methods are implemented, and for what purpose and with which educational interventions in mind. In this literature review, the authors aim to generate an overview of these concepts and to develop a decision framework for using sequence analysis in educational research. After analysing 44 articles, the conclusions enable us to highlight the different learning tasks and educational settings where sequences are analysed, identify data mapping models for different types of sequence actions, differentiate methods based on purpose and scope, and identify possible educational interventions based on the outcomes of sequence analysis. Comment: Submitted to the Journal of Learning Analytics
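One method family such reviews cover is dissimilarity-based sequence analysis, where coded learner actions are compared pairwise. A minimal sketch using plain edit distance (the action labels are hypothetical, and the review does not prescribe this particular measure):

```python
def levenshtein(a, b):
    """Edit distance between two action sequences: the minimum number of
    insertions, deletions and substitutions turning one into the other."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1,               # delete x
                           cur[j - 1] + 1,            # insert y
                           prev[j - 1] + (x != y)))   # substitute x -> y
        prev = cur
    return prev[-1]

# Two hypothetical learner traces differing by one inserted action.
print(levenshtein(["read", "quiz", "forum"], ["read", "video", "quiz", "forum"]))  # → 1
```

Pairwise distances of this kind are typically fed to clustering to group learners with similar action patterns.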
Impact of annotation modality on label quality and model performance in the automatic assessment of laughter in-the-wild
Laughter is considered one of the most overt signals of joy. Laughter is
well-recognized as a multimodal phenomenon but is most commonly detected by
sensing the sound of laughter. It is unclear how perception and annotation of
laughter differ when annotated from other modalities like video, via the body
movements of laughter. In this paper we take a first step in this direction by
asking if and how well laughter can be annotated when only audio, only video
(containing full body movement information) or audiovisual modalities are
available to annotators. We ask whether annotations of laughter are congruent
across modalities, and compare the effect that labeling modality has on machine
learning model performance. We compare annotations and models for laughter
detection, intensity estimation, and segmentation, three tasks common in
previous studies of laughter. Our analysis of more than 4000 annotations
acquired from 48 annotators revealed evidence of incongruity between
modalities in the perception of laughter and its intensity. Further analysis
of annotations against consolidated audiovisual reference annotations revealed
that recall was lower on average for the video condition than for the audio
condition, but tended to increase with the intensity of the laughter samples.
Our machine learning experiments compared the performance of state-of-the-art
unimodal (audio-based, video-based and acceleration-based) and multi-modal
models for different combinations of input modalities, training label modality,
and testing label modality. Models with video and acceleration inputs had
similar performance regardless of training label modality, suggesting that it
may be entirely appropriate to train models for laughter detection from body
movements using video-acquired labels, despite their lower inter-rater
agreement.
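The recall comparison against the consolidated reference can be illustrated on frame-level labels; a toy sketch, assuming binary per-frame annotations (the annotation format here is an assumption, not taken from the paper):

```python
def frame_recall(reference, predicted):
    """Fraction of positive reference frames recovered by an annotation."""
    hits = sum(1 for r, p in zip(reference, predicted) if r and p)
    positives = sum(reference)
    return hits / positives if positives else 0.0

# Consolidated audiovisual reference vs. a hypothetical video-only annotation:
# the video annotator misses one of four laughter frames.
ref   = [0, 1, 1, 1, 0, 1]
video = [0, 1, 0, 1, 0, 1]
print(frame_recall(ref, video))  # → 0.75
```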
Context Cues For Classification Of Competitive And Collaborative Overlaps
Being able to respond appropriately to users’ overlaps should be seen as one of the core competencies of incremental dialogue systems. At the same time, identifying whether an interlocutor wants to support the speaker or grab the turn is a task that comes naturally to humans but has not yet been implemented in such systems. Motivated by this, we first investigate whether prosodic characteristics of speech in the vicinity of overlaps differ significantly from prosodic characteristics in the vicinity of non-overlapping speech. We then test the suitability of different context sizes, both preceding and following the overlap but excluding features of the overlap itself, for the automatic classification of collaborative and competitive overlaps. We also test whether fusing the preceding and following contexts improves the classification. Preliminary results indicate that the optimal context for overlap classification lies at 0.2 seconds preceding the overlap and up to 0.3 seconds following it. We demonstrate that we are able to classify collaborative and competitive overlaps with a median accuracy of 63%.
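The context-window setup described above can be sketched as follows; the 10 ms frame rate and the mean/standard-deviation summary statistics are assumptions for illustration, not the paper's exact feature set:

```python
import numpy as np

RATE = 100  # assumed analysis frame rate: one frame per 10 ms

def context_features(pitch, energy, overlap_start, overlap_end,
                     pre=0.2, post=0.3, rate=RATE):
    """Summarise prosody in a window preceding and a window following an
    overlap, excluding the overlapped stretch itself (times in seconds)."""
    s, e = int(overlap_start * rate), int(overlap_end * rate)
    windows = (slice(max(0, s - int(pre * rate)), s),       # preceding context
               slice(e, min(len(pitch), e + int(post * rate))))  # following
    feats = [stat(track[w])               # mean and spread per track/window
             for track in (pitch, energy)
             for w in windows
             for stat in (np.mean, np.std)]
    return np.array(feats)

# Synthetic 5-second pitch and energy tracks; overlap from 2.0 s to 2.5 s.
rng = np.random.default_rng(0)
pitch, energy = rng.normal(120, 10, 500), rng.random(500)
print(context_features(pitch, energy, 2.0, 2.5).shape)  # → (8,)
```

Feature vectors like this, computed per overlap, would then feed a standard classifier for the collaborative/competitive distinction.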
Who Will Get the Grant? A Multimodal Corpus for the Analysis of Conversational Behaviours in Group
In the last couple of years, more and more multimodal corpora have been created. Recently, many of these corpora have also included RGB-D sensors' data. However, there is to our knowledge no publicly available corpus which combines accurate gaze-tracking and high-quality audio recording for group discussions of varying dynamics. With a corpus that fulfils these needs, it would be possible to investigate higher-level constructs such as group involvement, individual engagement or rapport, all of which require multi-modal feature extraction. In the following paper we describe the design and recording of such a corpus, and we provide some illustrative examples of how it might be exploited in the study of group dynamics.
The Influence of Syntactic Boundaries on Place Assimilation in German
Oertel C, Windmann A. The Influence of Syntactic Boundaries on Place Assimilation in German. Presented at GLOW 33, Wroclaw, Poland.
Modelling Engagement in Multi-Party Conversations: Data-Driven Approaches to Understanding Human-Human Communication Patterns for Use in Human-Robot Interactions
The aim of this thesis is to study human-human interaction in order to provide virtual agents and robots with the capability to engage in multi-party conversations in a human-like manner. The focus lies on the modelling of conversational dynamics and the appropriate realization of multi-modal feedback behaviour. For such an undertaking, it is important to understand how human-human communication unfolds in varying contexts and constellations over time. To this end, multi-modal human-human corpora are designed, and annotation schemes to capture conversational dynamics are developed. Multi-modal analysis is carried out and models are built. Emphasis is put not on modelling speaker behaviour in general but on modelling listener behaviour in particular. In this thesis, a bridge is built between multi-modal modelling of conversational dynamics on the one hand and multi-modal generation of listener behaviour in virtual agents and robots on the other. In order to build this bridge, unit-selection multi-modal synthesis of feedback is carried out, as well as statistical speech synthesis of feedback. The effect of a variation in the prosody of feedback tokens on the perception of third-party observers is evaluated. Finally, the effect of a controlled variation of eye-gaze is evaluated, as is the perception of user feedback in human-robot interaction.
A Gaze-based Method for Relating Group Involvement to Individual Engagement in Multimodal Multiparty Dialogue
This paper is concerned with modelling individual engagement and group involvement, as well as their relationship, in an eight-party, multimodal corpus. We propose a number of features (presence, entropy, symmetry and maxgaze) that summarise different aspects of eye-gaze patterns and allow us to describe individual as well as group behaviour over time. We use these features to define similarities between the subjects, and we compare this information with the engagement rankings the subjects gave at the end of each interaction about themselves and the other participants. We analyse how these features relate to four classes of group involvement and build a classifier that is able to distinguish between those classes with 71% accuracy.
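Two of the window-level gaze features named above, entropy and maxgaze, can be sketched directly; the per-frame target encoding is a simplifying assumption about the data format:

```python
import math
from collections import Counter

def gaze_entropy(targets):
    """Shannon entropy (bits) of gaze targets in a window: low when gaze
    stays on one participant, high when it is spread across the group."""
    n = len(targets)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(targets).values())

def max_gaze(targets):
    """Fraction of the window spent on the single most-looked-at target."""
    return max(Counter(targets).values()) / len(targets)

window = ["P1", "P1", "P2", "P3"]  # hypothetical per-frame gaze targets
print(gaze_entropy(["P1"] * 4))    # → 0.0 (gaze never moves)
print(max_gaze(window))            # → 0.5
```

Computed per participant and per time window, features like these support the kind of similarity and involvement analysis the abstract describes.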
Context cues for classification of competitive and collaborative overlaps
Oertel C, Wlodarczak M, Tarasov A, Campbell N, Wagner P. Context cues for classification of competitive and collaborative overlaps. In: Proceedings of Speech Prosody 2012. 2012: 721-724.
Automatic Prominence Annotation of a German Speech Synthesis Corpus: Towards Prominence-Based Prosody Generation for Unit Selection Synthesis
This paper describes work directed towards the development of a syllable prominence-based prosody generation functionality for a German unit selection speech synthesis system. A general concept for syllable prominence-based prosody generation in unit selection synthesis is proposed. As a first step towards its implementation, an automated syllable prominence annotation procedure based on acoustic analyses has been performed on the BOSS speech corpus. The prominence labeling has been evaluated against an existing annotation of lexical stress levels and manual prominence labeling on a subset of the corpus. We discuss methods and results and give an outlook on further implementation steps.
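An acoustically driven prominence score of the kind described can be sketched as a combination of normalised cues; the cue choice and the equal weighting are assumptions for illustration, not the procedure applied to the BOSS corpus:

```python
import numpy as np

def prominence_scores(duration, f0_range, energy):
    """Per-syllable prominence as the mean of z-scored acoustic cues:
    longer syllables with larger pitch excursions and more energy
    score higher."""
    def z(x):
        x = np.asarray(x, dtype=float)
        sd = x.std()
        return (x - x.mean()) / sd if sd else np.zeros_like(x)
    return (z(duration) + z(f0_range) + z(energy)) / 3.0

# Three hypothetical syllables; the second is clearly most prominent
# on all three cues (duration in s, F0 range in Hz, relative energy).
scores = prominence_scores([0.10, 0.28, 0.12], [8.0, 45.0, 10.0], [0.2, 0.9, 0.25])
print(int(np.argmax(scores)))  # → 1
```

Continuous scores like these would then be quantised into the prominence levels used as unit-selection targets.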